Unsupervised Activity Perception in Crowded and Complicated Scenes Using Hierarchical Bayesian Models Citation

نویسندگان

  • Xiaogang Wang
  • Xiaoxu Ma
  • Eric L. Grimson
چکیده

We propose a novel unsupervised learning framework to model activities and interactions in crowded and complicated scenes. Under our framework, hierarchical Bayesian models are used to connect three elements in visual surveillance: low-level visual features, simple “atomic” activities, and interactions. Atomic activities are modeled as distributions over low-level visual features, and multiagent interactions are modeled as distributions over atomic activities. These models are learned in an unsupervised way. Given a long video sequence, moving pixels are clustered into different atomic activities and short video clips are clustered into different interactions. In this paper, we propose three hierarchical Bayesian models: the Latent Dirichlet Allocation (LDA) mixture model, the Hierarchical Dirichlet Processes (HDP) mixture model, and the Dual Hierarchical Dirichlet Processes (Dual-HDP) model. They advance existing topic models, such as LDA [1] and HDP [2]. Directly using existing LDA and HDP models under our framework, only moving pixels can be clustered into atomic activities. Our models can cluster both moving pixels and video clips into atomic activities and into interactions. The LDA mixture model assumes that it is already known how many different types of atomic activities and interactions occur in the scene. The HDP mixture model automatically decides the number of categories of atomic activities. The DualHDP automatically decides the numbers of categories of both atomic activities and interactions. Our data sets are challenging video sequences from crowded traffic scenes and train station scenes with many kinds of activities co-occurring. Without tracking and human labeling effort, our framework completes many challenging visual surveillance tasks of broad interest such as: 1) discovering and providing a summary of typical atomic activities and interactions occurring in the scene, 2) segmenting long video sequences into different interactions, 3) segmenting motions into different activities, 4) detecting abnormality, and 5) supporting high-level queries on activities and interactions. In our work, these surveillance problems are formulated in a transparent, clean, and probabilistic way compared with the ad hoc nature of many existing approaches.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Action Recognition Using Topic Models

In this book chapter, we will introduce approaches of using topic models for action recognition. Topic models were originally developed in language processing. In recent years, they were applied to action recognition and other computer vision problems, and achieved great success. Topic models are unsupervised. The models of actions are learned through exploring the co-occurrence of visual featu...

متن کامل

Traffic Scene Analysis using Hierarchical Sparse Topical Coding

Analyzing motion patterns in traffic videos can be exploited directly to generate high-level descriptions of the video contents. Such descriptions may further be employed in different traffic applications such as traffic phase detection and abnormal event detection. One of the most recent and successful unsupervised methods for complex traffic scene analysis is based on topic models. In this pa...

متن کامل

Video anomaly detection based on a hierarchical activity discovery within spatio-temporal contexts

In this paper, we present a novel approach for video-anomaly detection in crowded and complicated scenes. The proposed approach detects anomalies based on a hierarchical activity-pattern discovery framework, comprehensively considering both global and local spatio-temporal contexts. The discovery is a coarse-to-fine learning process with unsupervised methods for automatically constructing norma...

متن کامل

Learning Sparse Prototypes via Ensemble Coding Mechanisms for Crowd Perception

This paper presents a novel approach to learning a dictionary of crowd prototypes for dynamic visual scenes. Recent work in cognitive psychology suggests that crowd perception may be based on pre-attentive ensemble coding mechanisms [24] in the spirit of feedforward hierarchical models of visual processing [4]. We extend a biological model of motion processing [10] with a new dictionary learnin...

متن کامل

Online multiple people tracking-by-detection in crowded scenes

Multiple people detection and tracking is a challenging task in real-world crowded scenes. In this paper, we have presented an online multiple people tracking-by-detection approach with a single camera. We have detected objects with deformable part models and a visual background extractor. In the tracking phase we have used a combination of support vector machine (SVM) person-specific classifie...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2009